Service Integrations
This document describes the Service Integration layer that encapsulates business logic for interacting with external systems. It explains how service classes abstract integrations for GitHub repository analysis and crawling, Gmail email management, Calendar scheduling, YouTube video content processing, academic portal access for the JIIT web portal, and general website analysis. It also documents request/response handling, error management, authentication mechanisms, rate-limiting considerations, data transformation processes, security and privacy practices, and performance optimization strategies for external API interactions. Examples of usage, integration patterns, and extension points for adding new services are included.
The Service Integration layer is organized around:
Services: Thin orchestration classes that call tools and prompts, handle errors, and transform data.
Routers: FastAPI endpoints that validate requests, enforce authentication, and delegate to services.
Tools: Low-level adapters to external APIs or utilities (e.g., GitHub crawler, YouTube transcript fetcher, website scraper).
Prompts: LLM chains/pipelines that structure prompts and orchestrate model calls.
Models: Request/response DTOs for routers.
Core: Logging, configuration, and LLM model configuration.
Diagram sources
Section sources
GitHubService: Converts a GitHub repository into a structured markdown representation, then uses a prompt chain to answer questions. Supports optional file attachments via Google GenAI SDK.
GmailService: Provides list/unread, latest messages, mark read, and send operations using an access token.
CalendarService: Lists calendar events and creates events using an access token.
YouTubeService: Generates answers using video metadata and transcripts, optionally with file attachments via Google GenAI SDK.
PyjiitService: Handles login, semester discovery, and attendance retrieval for the JIIT web portal, with a hardcoded semester mapping.
WebsiteService: Fetches remote content via a server-side scraper and augments with client-side HTML-to-markdown conversion, then uses a prompt chain to answer questions.
BrowserUseService: Generates a JSON action plan for automating browser tasks based on goals and DOM structure.
GoogleSearchService: Executes a web search pipeline and returns results.
ReactAgentService: Orchestrates a multi-modal reactive agent with optional file uploads and context injection.
Section sources
The Service Integration layer follows a service-oriented architecture:
Routers validate and sanitize requests, enforce authentication, and pass typed models to services.
Services encapsulate business logic, orchestrate tools and prompts, and manage error handling.
Tools abstract external API calls and data transformations.
Prompts define LLM chains for natural language processing tasks.
Core provides logging, configuration, and LLM model configuration.
Diagram sources
GitHub Integration#
Purpose: Convert a GitHub repository into a structured markdown representation and answer questions using an LLM prompt chain.
Key steps:
Repository ingestion via a GitHub crawler tool.
Optional file attachment processing via Google GenAI SDK.
Prompt chain invocation with summary, tree, content, and chat history.
Error handling for invalid URLs, access issues, and token limits.
Authentication: No direct authentication required for public repositories; private repositories require appropriate permissions outside this service.
Rate limiting: Subject to GitHub API quotas; consider caching and pagination where applicable.
Security: Avoid exposing sensitive data; sanitize inputs and limit context size.
Diagram sources
Section sources
Gmail Integration#
Purpose: Manage emails via Gmail API using an OAuth access token.
Operations:
List unread messages with configurable max results.
Fetch latest messages with configurable max results.
Mark a message as read.
Send an email.
Authentication: Requires a valid access token.
Rate limiting: Subject to Gmail API quotas; batch operations where possible.
Security: Store tokens securely; avoid logging sensitive data.
Diagram sources
Section sources
Calendar Integration#
Purpose: Interact with Google Calendar using an OAuth access token.
Operations:
List upcoming events with configurable max results.
Create an event with summary, start/end times, and description.
Authentication: Requires a valid access token.
Rate limiting: Subject to Calendar API quotas; throttle requests.
Security: Protect tokens; validate ISO 8601 timestamps.
Diagram sources
Section sources
YouTube Processing#
Purpose: Answer questions about YouTube videos using metadata and transcripts, optionally with file attachments.
Key steps:
Optional transcript extraction.
Optional file attachment via Google GenAI SDK.
Prompt chain invocation with URL, question, and chat history.
Authentication: No direct authentication required for public videos; private videos may require access.
Rate limiting: Subject to YouTube and external provider quotas; cache transcripts.
Security: Avoid leaking sensitive context; sanitize inputs.
Diagram sources
Section sources
Academic Portal Access (PyJIIT)#
Purpose: Authenticate and retrieve academic data from the JIIT web portal.
Operations:
Login to establish a session.
Discover registered semesters.
Retrieve attendance for a specific or default semester (hardcoded mapping).
Authentication: Username/password login; session payload used for subsequent requests.
Rate limiting: Subject to portal rate limits; avoid frequent polling.
Security: Protect credentials and session payloads; avoid logging sensitive data.
Diagram sources
Section sources
Website Analysis#
Purpose: Answer questions about a given URL using server-side scraping and optional client-side HTML augmentation.
Key steps:
Server-side markdown generation via a web scraper.
Optional client-side HTML-to-markdown conversion.
Optional file attachment via Google GenAI SDK.
Prompt chain invocation with combined context.
Authentication: No authentication required for public websites.
Rate limiting: Respect robots.txt and site policies; consider caching.
Security: Sanitize HTML and avoid leaking internal context.
Diagram sources
Section sources
Browser Automation Script Generation#
Purpose: Generate a JSON action plan for automating browser tasks based on a goal and DOM structure.
Key steps:
Format DOM info and constraints into a prompt.
Invoke an LLM chain to produce a JSON action plan.
Sanitize and validate the resulting JSON.
Authentication: Not applicable.
Rate limiting: Not applicable.
Security: Validate and sanitize generated JSON to prevent unsafe actions.
Diagram sources
Section sources
Google Search#
Purpose: Execute a web search pipeline and return results.
Authentication: Not applicable.
Rate limiting: Subject to search provider quotas; implement client-side throttling.
Security: Avoid exposing sensitive queries; sanitize results.
Section sources
React Agent#
Purpose: Orchestrate a multi-modal reactive agent with optional file uploads and context injection.
Key steps:
Optional file upload via Google GenAI SDK.
Optional client HTML-to-markdown context injection.
Build a message list with chat history and human question.
Invoke a graph-based agent to produce a final answer.
Authentication: Not applicable for file uploads; external integrations may require tokens.
Rate limiting: Subject to GenAI and external provider quotas.
Security: Sanitize inputs and avoid leaking context.
Diagram sources
Section sources
Cohesion: Each service encapsulates a single domain (GitHub, Gmail, Calendar, YouTube, PyJIIT, Website, Browser automation, Search, React Agent).
Coupling: Services depend on tools and prompts; routers depend on services; minimal cross-service coupling.
External dependencies: Google GenAI SDK, external APIs (GitHub, Gmail, Calendar, YouTube, JIIT portal), web scrapers.
Circular dependencies: None observed among services and routers.
Diagram sources
Section sources
External API quotas: Implement retry with exponential backoff and circuit breakers for external providers.
Payload limits: Truncate or summarize large contexts; use streaming where supported.
Caching: Cache frequently accessed data (e.g., YouTube transcripts, website metadata) with appropriate TTLs.
Concurrency: Use async/await for I/O-bound operations; limit concurrent external calls.
Model costs: Prefer smaller models for routine tasks; reserve larger models for complex reasoning.
Network latency: Batch requests where possible; pre-warm connections.
GitHub repository access:
Invalid URL or 404: Ensure the repository URL points to the repository root and is public or accessible.
PathKind errors: Verify the URL format.
LLM token limits:
Context window exceeded: Reduce input size or ask focused questions.
Gmail/Calendar:
Missing access token: Ensure the token is provided and valid.
Invalid time formats: Confirm ISO 8601 strings for start/end times.
YouTube:
Transcript fetch failures: Transcripts may not be available; fallback to video metadata.
PyJIIT:
Login failures: Verify credentials; check session payload validity.
Hardcoded semester mapping: Ensure the target semester exists in the mapping.
React Agent:
Validation errors: Inspect sanitized JSON and adjust prompts or constraints.
Section sources
The Service Integration layer cleanly separates concerns between routing, business logic, tooling, and prompting. It supports robust integrations with GitHub, Gmail, Calendar, YouTube, the JIIT academic portal, and general website analysis. By centralizing error handling, authentication, and data transformation, it enables maintainable extensions and consistent behavior across external integrations.
Request/Response Handling Patterns#
Routers validate inputs, enforce authentication, and return standardized responses.
Services encapsulate retries, logging, and error translation.
Tools abstract external API specifics and return structured data.
Prompts standardize LLM interactions and context formatting.
Section sources
Authentication Mechanisms#
OAuth access tokens for Gmail and Calendar.
Username/password for PyJIIT login; session payloads for subsequent operations.
No authentication for public GitHub repositories and general website analysis.
Section sources
API Rate Limiting Considerations#
Implement client-side rate limiting and exponential backoff.
Use caching to reduce repeated calls.
Monitor provider quotas and alert on near-threshold usage.
Data Transformation Processes#
GitHub: Repository to markdown summary, tree, and content.
YouTube: Transcript extraction and optional file uploads.
Website: Server-side scraping and client-side HTML-to-markdown conversion.
PyJIIT: Normalization of attendance data and subject codes.
Section sources
Security and Privacy#
Avoid logging sensitive data (tokens, credentials, personal info).
Sanitize inputs and outputs; validate JSON action plans.
Use HTTPS and secure storage for tokens and session payloads.
Extension Points for New Services#
Create a new service class under services/.
Define request/response models under models/.
Implement routers under routers/.
Add tools under tools/ for external integrations.
Integrate prompts under prompts/ for LLM-driven workflows.
Wire dependencies using FastAPI Depends and ensure proper error handling.